With the advancement of virtual reality and 3D game technology, the demand for high-quality 3D indoor scene generation has surged. To address this need, this paper presents a VAE-GAN-based method that tackles two primary challenges: 3D scene representation and the design of the deep generative network. First, we introduce a matrix representation that encodes fine-grained object attributes, together with a complete graph that implicitly captures spatial relations between objects, so that both local and global scene structure are represented. Second, we devise a generative framework based on VAE-GAN and Bayesian optimization. The framework learns a Gaussian distribution over the encoded object attributes with a VAE-GAN network; sampling from this distribution and decoding the samples yields new object attributes. A U-Net is then employed to learn the spatial relations between objects. Finally, a Bayesian optimization module combines the generated object attributes, the spatial relations, and priors learned from data, and performs a global optimization to produce a plausible scene layout. Experimental results on a large-scale 3D indoor scene dataset show that our method effectively learns inter-object relations and generates diverse and plausible indoor scenes. Comparative experiments and user studies further indicate that our method surpasses current state-of-the-art techniques in indoor scene generation and that the generated scenes are comparable to the real training scenes.
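To make the scene representation concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes a hypothetical three-category vocabulary and a hypothetical attribute layout (one-hot category, centre, size, yaw) for each matrix row, and it uses explicit pairwise distance and relative yaw as stand-ins for the edge information of the complete graph. All names (`CATEGORIES`, `encode_scene`, `complete_graph_relations`) are assumptions introduced for illustration.

```python
import math
from itertools import combinations

# Hypothetical category vocabulary; the actual label set is an assumption.
CATEGORIES = ["bed", "nightstand", "wardrobe"]


def encode_scene(objects):
    """Build one attribute row per object:
    one-hot category, centre (x, y, z), size (w, h, d), yaw."""
    rows = []
    for obj in objects:
        one_hot = [1.0 if obj["category"] == c else 0.0 for c in CATEGORIES]
        rows.append(one_hot + list(obj["center"]) + list(obj["size"]) + [obj["yaw"]])
    return rows


def complete_graph_relations(objects):
    """For every unordered pair of objects (the edges of a complete graph),
    return the centre-to-centre distance and the relative yaw."""
    relations = {}
    for i, j in combinations(range(len(objects)), 2):
        distance = math.dist(objects[i]["center"], objects[j]["center"])
        relative_yaw = (objects[j]["yaw"] - objects[i]["yaw"]) % (2 * math.pi)
        relations[(i, j)] = (distance, relative_yaw)
    return relations


# A toy bedroom with three objects (values are made up for illustration).
scene = [
    {"category": "bed", "center": (0.0, 0.0, 0.0), "size": (2.0, 0.5, 1.6), "yaw": 0.0},
    {"category": "nightstand", "center": (3.0, 0.0, 4.0), "size": (0.5, 0.5, 0.5), "yaw": 0.0},
    {"category": "wardrobe", "center": (0.0, 0.0, 4.0), "size": (1.2, 2.0, 0.6), "yaw": math.pi},
]

attribute_matrix = encode_scene(scene)          # one row per object
relations = complete_graph_relations(scene)     # one entry per object pair
```

In this sketch a scene with n objects gives an n-row attribute matrix and n(n-1)/2 pairwise entries, which is the sense in which a complete graph covers every object pair; how the relations are actually encoded and learned by the U-Net is not specified by this example.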